Challenge: Nova, a pharmaceutical company, has not yet drawn business insights from its data on how advertising spending influences sales. Nova needs to allocate its budget wisely to maximize sales, but it does not know which strategy is most effective.
Objectives:
Strategy:
Expected Outcomes:
Benefits:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv(r"C:\Users\Teni\Desktop\Advertising.csv")
# data for year 2022 and 2023
df.head(10)
# This record shows how much Nova spent on Instagram, Youtube and Podcasts ads over that period,
# as well as the sales generated (generated_sales) by each ad campaign.
| | Instagram | Youtube | Podcasts | generated_sales |
---|---|---|---|---|
0 | 230.1 | 37.8 | 69.2 | 22.1 |
1 | 44.5 | 39.3 | 45.1 | 10.4 |
2 | 17.2 | 45.9 | 69.3 | 9.3 |
3 | 151.5 | 41.3 | 58.5 | 18.5 |
4 | 180.8 | 10.8 | 58.4 | 12.9 |
5 | 8.7 | 48.9 | 75.0 | 7.2 |
6 | 57.5 | 32.8 | 23.5 | 11.8 |
7 | 120.2 | 19.6 | 11.6 | 13.2 |
8 | 8.6 | 2.1 | 1.0 | 4.8 |
9 | 199.8 | 2.6 | 21.2 | 10.6 |
# To see how much was spent on ads overall and the resulting sales, we add a column that sums the spend across all three channels.
df['total_AdSpend'] = df['Instagram'] + df['Youtube'] + df['Podcasts']
df
| | Instagram | Youtube | Podcasts | generated_sales | total_AdSpend |
---|---|---|---|---|---|
0 | 230.1 | 37.8 | 69.2 | 22.1 | 337.1 |
1 | 44.5 | 39.3 | 45.1 | 10.4 | 128.9 |
2 | 17.2 | 45.9 | 69.3 | 9.3 | 132.4 |
3 | 151.5 | 41.3 | 58.5 | 18.5 | 251.3 |
4 | 180.8 | 10.8 | 58.4 | 12.9 | 250.0 |
... | ... | ... | ... | ... | ... |
195 | 38.2 | 3.7 | 13.8 | 7.6 | 55.7 |
196 | 94.2 | 4.9 | 8.1 | 9.7 | 107.2 |
197 | 177.0 | 9.3 | 6.4 | 12.8 | 192.7 |
198 | 283.6 | 42.0 | 66.2 | 25.5 | 391.8 |
199 | 232.1 | 8.6 | 8.7 | 13.4 | 249.4 |
200 rows × 5 columns
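As a hedged alternative to adding the three columns by hand, the same total can be computed with a row-wise sum over a list of channel columns, which is easier to extend if more channels are added later. The sketch below uses a small illustrative frame (`sample`, a hypothetical name) built from the first three rows shown above, and assumes the first numeric column in the output is `Instagram`.

```python
import pandas as pd

# Illustrative sample only: the first three rows of the table above.
# Assumption: the first numeric column in that output is Instagram.
sample = pd.DataFrame({
    "Instagram": [230.1, 44.5, 17.2],
    "Youtube": [37.8, 39.3, 45.9],
    "Podcasts": [69.2, 45.1, 69.3],
    "generated_sales": [22.1, 10.4, 9.3],
})

# Sum across the channel columns for each row (axis=1 means row-wise).
channels = ["Instagram", "Youtube", "Podcasts"]
sample["total_AdSpend"] = sample[channels].sum(axis=1)

print(sample[["total_AdSpend", "generated_sales"]])
```

On the full data, the equivalent line would be `df['total_AdSpend'] = df[channels].sum(axis=1)`.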
plt.figure(figsize=[10, 6], dpi= 200)
sns.scatterplot(data=df, x = 'total_AdSpend', y = 'generated_sales');
# This shows a positive, roughly linear relationship between total ad spend and the sales Nova generated.
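The visual impression from the scatter plot can be backed by a number. A minimal sketch, assuming the Pearson correlation coefficient is an acceptable summary here: it uses only the ten rows displayed by `df.head(10)` (with totals summed from those rows), so the value it prints describes that small sample and not the full 200-row dataset. The names `total_spend_sample` and `sales_sample` are hypothetical.

```python
import numpy as np

# Illustrative sample only: total ad spend and generated_sales for the
# ten rows shown by df.head(10); totals are the row sums of the three channels.
total_spend_sample = np.array(
    [337.1, 128.9, 132.4, 251.3, 250.0, 132.6, 113.8, 151.4, 11.7, 223.6]
)
sales_sample = np.array([22.1, 10.4, 9.3, 18.5, 12.9, 7.2, 11.8, 13.2, 4.8, 10.6])

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry
# is the Pearson correlation between the two series.
r = np.corrcoef(total_spend_sample, sales_sample)[0, 1]
print(f"Pearson r on the 10-row sample: {r:.3f}")
```

On the full frame, `df['total_AdSpend'].corr(df['generated_sales'])` computes the same statistic.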
Defining the Coefficient Variable
X = df['total_AdSpend']
y = df['generated_sales']
# X is the Feature (Independent Variable)
# y is the Label (Dependent Variable)
# From the linear regression equation y = mx + b
# We use deg=1 because the relationship is linear
np.polyfit(X, y, deg=1)
# This returns the coefficients m (slope) and b (intercept), in that order
array([0.04868788, 4.24302822])
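`np.polyfit` returns coefficients from the highest degree down, so for `deg=1` the first value is the slope `m` and the second is the intercept `b`. The sketch below copies the two values from the output above, unpacks them into named variables (`slope` and `intercept` are illustrative names), and checks that `np.polyval` gives the same result as writing out `m*x + b` by hand.

```python
import numpy as np

# Coefficients copied from the np.polyfit output above (highest degree first).
coeffs = np.array([0.04868788, 4.24302822])
slope, intercept = coeffs

# Evaluate the fitted line at two example spend values.
spend_points = np.array([2.0, 340.0])
manual = slope * spend_points + intercept   # y = mx + b written out
via_polyval = np.polyval(coeffs, spend_points)  # same line, evaluated by NumPy

print(manual)
print(via_polyval)
```

Keeping the fitted array in a variable and passing it to `np.polyval` avoids retyping the coefficients, which is one less place for a transcription error.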
Hypothetical Determinant
projected_cost = np.linspace(2, 340, 300)
# Here we have an array of 300 numbers spaced from 2 to 340.
# Using the y = mx + b
projected_sales = 0.04868788*projected_cost + 4.24302822
Q3_AdDeterminant = pd.DataFrame({'Ad_costs': projected_cost, 'Projected_sales': projected_sales})
Q3_AdDeterminant
| | Ad_costs | Projected_sales |
---|---|---|
0 | 2.000000 | 4.340404 |
1 | 3.130435 | 4.395442 |
2 | 4.260870 | 4.450481 |
3 | 5.391304 | 4.505519 |
4 | 6.521739 | 4.560558 |
... | ... | ... |
295 | 335.478261 | 20.576754 |
296 | 336.608696 | 20.631792 |
297 | 337.739130 | 20.686830 |
298 | 338.869565 | 20.741869 |
299 | 340.000000 | 20.796907 |
300 rows × 2 columns
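The spacing in the `Ad_costs` column follows directly from how `np.linspace` works: 300 points between 2 and 340 inclusive leave 299 equal gaps, so each step is (340 - 2) / 299. A short sketch to confirm this (the names `grid` and `step` are illustrative):

```python
import numpy as np

# Same call as above: 300 evenly spaced values from 2 to 340, endpoints included.
grid = np.linspace(2, 340, 300)

# With both endpoints included there are 299 intervals between 300 points.
step = grid[1] - grid[0]
print(len(grid), grid[0], grid[-1])
print(step, (340 - 2) / 299)
```

This is why the second row of the table is 3.130435 and not 3.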
sns.scatterplot(data=df, x = 'total_AdSpend', y = 'generated_sales');
plt.plot(projected_cost, projected_sales, color ='purple');
If we had an ad budget of 10k, what would our projected sales be?
spend = 10  # spend is entered in thousands here, so 10k is 10
sales = 0.04868788*spend + 4.24302822
sales
4.729907020000001
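For repeated what-if questions like the one above, the calculation can be wrapped in a small helper. This is a minimal sketch, not part of the original analysis: `project_sales` is a hypothetical function name, the coefficients are copied from the fit above, and spend is assumed to be in the same units as `total_AdSpend` (so a 10k budget is entered as 10, as in the cell above).

```python
# Coefficients copied from the np.polyfit output earlier in the notebook.
SLOPE = 0.04868788
INTERCEPT = 4.24302822


def project_sales(spend: float) -> float:
    """Return projected sales for a given total ad spend, using y = mx + b."""
    return SLOPE * spend + INTERCEPT


# Compare a few hypothetical budgets side by side.
for budget in (10, 100, 200):
    print(budget, round(project_sales(budget), 3))
```

Projections are most trustworthy for budgets inside the range of `total_AdSpend` seen in the data; values far outside it rely on the line continuing to hold.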